Modeling Data using Directional Distributions

Author

  • Inderjit S. Dhillon
Abstract

Traditionally, multivariate normal distributions have been the staple of data modeling in most domains. For some domains, the model they provide is either inadequate or incorrect because it disregards the directional components of the data. We present a generative model that is suitable for directional data (as can arise in text and gene expression clustering). We use mixtures of von Mises-Fisher distributions to model our data, since the von Mises-Fisher distribution is the natural distribution for directional data. We derive an Expectation Maximization (EM) algorithm to find the maximum likelihood estimates for the parameters of our mixture model, and provide various experimental results to evaluate the “correctness” of our formulation. In this paper we also provide some of the mathematical background necessary to carry out all the derivations and to gain insight for an implementation.
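As a rough illustration of the building block the abstract refers to, the sketch below evaluates a von Mises-Fisher log-density and estimates a mean direction. It is not taken from the paper. It assumes the 3-dimensional case only, where the normalizing constant has the closed form kappa / (4*pi*sinh(kappa)), and the function names `vmf3_log_density` and `mean_direction` are hypothetical. Higher dimensions need a modified Bessel function and are not covered here.

```python
import math


def vmf3_log_density(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in R^3.

    Assumption: x and mu are 3-dimensional unit vectors and kappa > 0.
    Density: f(x) = kappa / (4*pi*sinh(kappa)) * exp(kappa * <mu, x>).
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    dot = sum(m * v for m, v in zip(mu, x))
    # log(4*pi*sinh(kappa)) = log(2*pi) + kappa + log(1 - exp(-2*kappa));
    # written this way to avoid overflow in sinh for large kappa.
    log_norm = (math.log(kappa) - math.log(2.0 * math.pi) - kappa
                - math.log(-math.expm1(-2.0 * kappa)))
    return log_norm + kappa * dot


def mean_direction(points):
    """Normalized vector sum of the points.

    For a single vMF component this is the maximum likelihood estimate
    of the mean direction mu.
    """
    total = [sum(coords) for coords in zip(*points)]
    norm = math.sqrt(sum(t * t for t in total))
    if norm == 0.0:
        raise ValueError("points sum to the zero vector; direction undefined")
    return [t / norm for t in total]


if __name__ == "__main__":
    mu = mean_direction([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
    print(mu)
    print(vmf3_log_density(mu, mu, kappa=5.0))
```

In an EM procedure for a mixture, per-component log-densities like this one would typically feed the E-step responsibilities, while a weighted version of the normalized sum would update each component's mean direction; the paper itself should be consulted for the actual update equations.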


Similar resources

Modeling data using directional distributions: Part II

High-dimensional data is central to most data mining applications, and only recently has it been modeled via directional distributions. In [Banerjee et al., 2003] the authors introduced the use of the von Mises-Fisher (vMF) distribution for modeling high-dimensional directional data, particularly for text and gene expression analysis. The vMF distribution is one of the simplest directional dist...


Bayesian Modeling of Directional Data with Acoustic and Other Applications

A direction is defined here as a multi-dimensional unit vector. Such unit vectors form directional data. Closely related to directional data are axial data, for which each direction is equivalent to the opposite direction. Directional data and axial data arise in various fields of science. In probabilistic modeling of such data, probability distributions are needed which account for the structure ...


Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data

When the observations reflect a multimodal, asymmetric or truncated construction or a combination of them, using usual unimodal and symmetric distributions leads to misleading results. Therefore, distributions with ability of modeling skewness, multimodality and truncation have been in the core of interest in statistical literature, always. There are different methods to contract ...


New families of wrapped distributions for modeling skew circular data

Tomasz J. Kozubowski, Department of Mathematics, University of Nevada, Reno. Abstract: We discuss circular distributions obtained by wrapping the classical exponential and Laplace distributions on the real line around the circle. We present explicit forms for their densities and distribution functions, as well as their trigonometric moments and related parameters, and discuss main properties of t...


Determination of Load and Strain-Stress Distributions in Hot Closed Die Forging Using the Plasticine Modeling Technique

An axisymmetric hot closed die-forging process has been studied by physical modeling technique using the plasticine. To observe the material flow pattern, layers of plasticine with different colors were used. The normal direction to the layers was considered a principal direction. The strain distribution was obtained by measuring the thickness of the plasticine layers. Based on the strain distr...




Publication date: 2003